A Multichannel Feature Compensation Approach for Robust ASR in Noisy and Reverberant Environments
Abstract
In this paper, we propose a multichannel feature compensation approach for automatic speech recognition (ASR) in reverberant and noisy environments. The proposed technique propagates the posterior of the clean signal, estimated by a multichannel Wiener filter in the short-time Fourier transform (STFT) domain, into the Mel-frequency cepstral coefficient (MFCC) domain. The multichannel Wiener filter reduces both reverberation and additive noise. Furthermore, we approximate the propagation of the prior distributions of speech and interference through the inverse STFT and the STFT with different time-frequency resolutions. This allows us to derive a multichannel minimum mean square error (MMSE) MFCC estimator whose STFT resolution differs from the resolution used in the speech enhancement stage. The proposed approach outperforms a multichannel short-time spectral amplitude estimation approach on both the clean training and multi-condition training ASR tasks of the REVERB challenge.
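To make the idea of carrying an STFT-domain posterior into the MFCC domain more concrete, the following is a minimal sketch and not the estimator derived in the paper. It assumes a complex Gaussian posterior per time-frequency bin and uses the identity E[|S|^2] = |mean|^2 + variance, then applies a Mel filterbank, a logarithm, and a DCT. Taking the log of the expected Mel energy is a crude approximation of the expected log energy, and all function names and parameter values (`mel_filterbank`, `dct_matrix`, `expected_mfcc`, 23 Mel bands, 13 cepstra) are hypothetical choices made for illustration.

```python
import numpy as np


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    n_bins = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    # Filter edges are equally spaced on the Mel scale.
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(nyquist), n_mels + 2))
    freqs = np.linspace(0.0, nyquist, n_bins)
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, mid, hi = hz_edges[m], hz_edges[m + 1], hz_edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_matrix(n_ceps, n_mels):
    """Unnormalized DCT-II matrix of shape (n_ceps, n_mels)."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_mels)


def expected_mfcc(post_mean, post_var, fb, dct, floor=1e-10):
    """Approximate MFCCs from a per-bin complex Gaussian STFT posterior.

    post_mean: complex array (frames, bins), posterior mean of the clean STFT.
    post_var:  real array (frames, bins), posterior variance per bin.
    Assumption: log(E[mel energy]) stands in for E[log(mel energy)].
    """
    # Second moment of a complex Gaussian: E[|S|^2] = |mu|^2 + sigma^2.
    expected_power = np.abs(post_mean) ** 2 + post_var
    mel_energy = expected_power @ fb.T
    return np.log(np.maximum(mel_energy, floor)) @ dct.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fb = mel_filterbank(n_mels=23, n_fft=512, sample_rate=16000)
    dct = dct_matrix(n_ceps=13, n_mels=23)
    mean = rng.standard_normal((5, 257)) + 1j * rng.standard_normal((5, 257))
    var = np.full((5, 257), 0.5)
    print(expected_mfcc(mean, var, fb, dct).shape)
```

Setting `post_var` to zero reduces this to ordinary MFCC extraction from the enhanced spectrum, so the variance term is where the posterior uncertainty enters the features.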
Similar resources
A multi-channel speech enhancement framework for robust NMF-based speech recognition for speech-impaired users
In this paper a multi-channel speech enhancement framework for distant speech acquisition in noisy and reverberant environments for Non-negative Matrix Factorization (NMF)-based Automatic Speech Recognition (ASR) is proposed. The system is evaluated for its use in an assistive vocal interface for physically impaired and speech-impaired users. The framework utilises the Spatially Pre-processed S...
Deep neural network based spectral feature mapping for robust speech recognition
Automatic speech recognition (ASR) systems suffer from performance degradation under noisy and reverberant conditions. In this work, we explore a deep neural network (DNN) based approach for spectral feature mapping from corrupted speech to clean speech. The DNN based mapping substantially reduces interference and produces estimated clean spectral features for ASR training and decoding. We expe...
Multi-step linear prediction based speech dereverberation in noisy reverberant environment
A speech signal captured by a distant microphone is generally contaminated by reverberation and background noise, which severely degrade the automatic speech recognition (ASR) performance. In this paper, we first extend a previously proposed single channel dereverberation algorithm to a multi-channel scenario. The method estimates late reflections using multichannel multi-step linear prediction...
Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR
The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments in part due to the lack of robustness in the underlying representations used in the ASR system. On the other hand, the auditory processing studies have shown the importance of modulation filtered spectrogram representations in robust human speech recognition. Inspired by these...